Abstract

Recent exponential growth in the field of distance education has unfortunately not been matched by equal
growth in a research base of sufficient quality to inform effective practice. In their 1996 analysis of the
distance education literature, McIsaac & Gunawardena state that "much research has taken the form of
program evaluation, descriptions of individual programs, brief case studies, institutional surveys and
speculative reports" (1996, p. 421). In the past few years, a particular form of study regarding the
effectiveness of distance learning has enjoyed increased publication. Instructional technology professionals
will recognize the methodology behind these reports as the media comparison study, newly revived for use
in a distance education setting. Media comparison studies have historically formed the basis of much
research in distance education (McIsaac & Gunawardena, 1996; Schlosser & Anderson, 1994), and they are
lately becoming even more common. These studies are predictably plagued by the same design issues as
their predecessors; however, their "no difference" outcomes are now being reported for different, politically
motivated reasons. This paper details the origins of the media comparison study, its current use as an evaluation
instrument in distance education, and recommendations for more stringent discrimination between
research and evaluation in the field of distance learning.






History of research in instructional technology 

Since the adoption of modern media for instructional purposes, innumerable attempts have been made to
measure the effect that a given technology has on student achievement. Early in the history of electronic technologies 
like film and radio, educational researchers were driven to demonstrate that these revolutionary devices had a 
positive impact on learning (Saettler, 1968). The most common approach to attempt this investigation was the 
“media comparison” study, so named because of its strategy of comparing the learning outcomes of an experimental 
group receiving instructional content via one medium against the outcomes of a control group receiving the same 
content through a different medium. The most popular control group was the “traditional”, or lecture format class, 
with the instructor serving as the delivery medium. Although comparison studies were fairly simple in design,
variations of the research strategy existed. Most treated the medium as the independent variable, but some used the
same instructional method (e.g., presentation via live lecture vs. presentation via videotape of the same lecture) while 
others utilized different instructional methods and different media (presentation via lecture vs. problem solving via 
computer-based instruction) (Ross & Morrison, 1996). 

The use of media comparison studies dates back to the origins of mediated instruction. Early researchers in 
audiovisual education worked diligently to control all aspects of such experiments so that results would maintain 
validity and comparisons would be fair. For example, McClusky & McClusky produced a “Comparison of six 
modes of presentation of subject matter”. Two trial experiments were conducted comparing six methods of 
presentation for the content depicted in two separate lessons: film only, slides with subtitles only, photographic 
pictures with subtitles only, and each medium with a supplemental question and answer session. Experiments were 
carefully controlled so participants viewed slide and print images for exactly the same time as such images appeared 
in the film. Recall of content was measured for each group using multiple-choice tests. The outcomes reflect some of 
the earliest evidence of what Russell (1997) calls the non-significant difference (NSD) phenomenon: “These 
comparisons show such inconsistent results that the film, slide, and print appear to possess no distinct advantage one 
over the other as far as these particular experiments are concerned” (Russell, 1997, p. 257). As history repeated
itself, researchers tried this design with the advent of each newer technological innovation, consistently producing
the same non-significant results.

In 1983, Richard Clark, one of the most renowned critics of instructional technology research, detailed the 
problems inherent in media comparison studies and the improper assumptions about their outcomes. He stated that 
“these findings were incorrectly offered as evidence that different media were ‘equally effective’ as conventional
means in promoting learning. No significant difference results simply suggest that changes in outcome scores did not 
result from any systematic differences in the treatments compared” (Clark, 1983, p. 447). Clark emphasized that 
media are merely the delivery mechanisms for instructional content and do not impact the learning process. This 
perspective spurred great debate within the field of instructional technology (Clark, 1994a; 1994b; Kozma, 1991; 
1994a; 1994b; Jonassen, Campbell, & Davidson, 1994; Morrison, 1994; Reiser, 1994; Shrock, 1994; Tennyson,
1994). While some contend that certain attributes of media can and do affect learning outcomes (Kozma, 1991;
1994a, 1994b), Clark maintains that it is instructional method that influences learning, not the delivery medium
(Clark, 1983; 1994a; 1994b). Although this debate will undoubtedly continue, the futility of using comparison studies to
measure the impact of media on learning is consistently recognized in the field of instructional technology (Ross &
Morrison, 1996). A chronological collection of hundreds of such experiments can be found at 
http://www2.ncsu.edu/oit/nsdsplit.htm (Russell, 1997). The database serves as a reminder to researchers that 
comparative designs will continue to provide predictable, non-significant outcomes. 

While comparisons of learning outcomes across media have traditionally sought to demonstrate the greater
effectiveness of the newer medium, distance education comparison studies have given the argument a new twist. As
the following examples show, the outcomes are now used to demonstrate that the distance-delivered
instructional event is at least equal to the campus-based, face-to-face version. Kanner, Runyon, & Desiderato (1958)
expressed this approach in less optimistic terms when they summarized their research on televised instruction by stating that
televised sessions were no more detrimental to classroom learning than face-to-face instruction.

Recent research in distance learning 

As anyone involved in the support of distance education programming is aware, the resources required to
deliver such programming to geographically and temporally dispersed learners are not inconsequential. Though
cost-saving goals are often highlighted in plans for reaching new and different student markets, the front-end investments
needed in course development, delivery infrastructures, teaching technologies, and support staff can be formidable
(Keegan, 1996; Musial & Kampmueller, 1996). In analyzing the use of distance program evaluation data, Thorpe
(1988) explains that administrators venturing into this new educational arena are understandably anxious to use positive
evaluation results to promote the desirable aspects of providing opportunities for remote student clientele. Increased
access alone does not seem to be regarded as sufficient justification for implementing distance
education efforts. Stakeholders want to demonstrate that participants in distance-delivered courses receive the same
quality of instruction off-campus as those in the “traditional” classroom setting. What better way to
determine the equality of experiences than to compare student achievement between the two groups? For example,
according to Newlands & McLean (1996), “The calming of fears about the quality of distance learning has been
assisted by evidence that, in terms of assessment, distance students perform as well as internal students...” (p. 289).
One of the most prominent early works in teleconferencing training, Bridging the Distance (Monson, 1978), employs
a collection of comparison studies for this very reason: to assure soon-to-be distance educators that off-campus
students will be just as academically successful as their campus-bound counterparts.

The apparently guaranteed validation of equal learner achievement has led to the use of comparison studies of
distance education in almost every imaginable discipline. The research design is exactly the same as in earlier
comparisons of mediated experiences: the on-campus students serve as the control group, since their experience is
unmediated, while the distant students form the treatment group. For example, "The 38 South Carolina campus
students, considered the control group in this report, completed either all or the majority of their degree programs in
traditional classroom settings on the Columbia campus" (Douglas, 1996, p. 878). Repeatedly, outcomes are
embodied in statements like “there was no real difference between grades of in-class and ITV students” (Fox, 1996,
p. 362). Some authors offer additional analysis such as, “There were no differences between pre- and post-tests
(which measure increases in knowledge) across sites. This demonstrates that the program was effective in increasing
knowledge...” (Reiss, Cameon, Matthews, & Shenkman, 1996, p. 350). Behind conclusions such as these is the
conviction that distance learners are engaged in an equally rigorous instructional experience even though they are not
participating in campus-based education. General unawareness of the history of instructional technology research,
especially of the outdated belief in the usefulness of comparison studies, is exemplified by the following excerpt
from a recent sociology journal:

Of greatest concern to us is the absence of well-crafted comparison studies, that examine not simply student 
attitudes towards distance learning, but the actual knowledge and skills that students acquire from televised 
teaching. Ideally, this demonstration could involve the same instructor teaching two sections of the same course 
during the same school term, one exclusively by live instruction and the other only by distance learning (Thyer,
Polk, & Gaudin, 1997, p. 367).

Instructional technology research journals are not exempt from the publication of such studies. In a 1996
issue of Educational Technology Research & Development, Whetzel, Felker, and Williams began their article “A
Real World Comparison of the Effectiveness of Satellite Training and Classroom Training” with an analysis of
research regarding the effectiveness of televised instruction, summing up the mixed results by restating Clark’s
(1994) view that “...any necessary teaching method can be delivered by many media with similar learning results”
(Whetzel, Felker, & Williams, 1996, p. 6). However, their study used a research design that compared the
achievement of on-site and distant students:
For the two courses in which satellite and classroom training were compared, an analysis of covariance
(ANCOVA) was used to compare delivery modes for nonequivalent groups (satellite and classroom participants),
using pretest score as the covariate and posttest score as the dependent variable (Whetzel et al., 1996, p. 10).
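To make the logic of this design concrete, the sketch below computes ANCOVA-adjusted posttest means in Python. It assumes a single covariate (the pretest) and a common within-group regression slope, which is the standard ANCOVA assumption; the function name `adjusted_means` and all scores are hypothetical and are not drawn from Whetzel et al. (1996). The adjustment removes only differences that are linearly related to the pretest; it cannot equate nonequivalent groups on unmeasured characteristics such as motivation.

```python
from statistics import fmean


def adjusted_means(groups):
    """Return (ANCOVA-adjusted posttest means, pooled within-group slope).

    `groups` maps a group label to a list of (pretest, posttest) pairs.
    Assumes one covariate and a common within-group slope (hypothetical
    helper written for illustration, not a library function).
    """
    sxy = 0.0  # pooled within-group cross-products of pretest and posttest
    sxx = 0.0  # pooled within-group sum of squares of the pretest
    for pairs in groups.values():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx

    grand_x = fmean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {}
    for label, pairs in groups.items():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        # Move each group's posttest mean to where it would be expected
        # if its pretest mean equalled the grand pretest mean.
        adjusted[label] = my - slope * (mx - grand_x)
    return adjusted, slope


# Hypothetical (pretest, posttest) scores for illustration only.
scores = {
    "satellite": [(50, 60), (60, 70), (70, 80)],
    "classroom": [(60, 72), (70, 82), (80, 92)],
}
adj, slope = adjusted_means(scores)
print(slope, adj)  # 1.0 {'satellite': 75.0, 'classroom': 77.0}
```

In this invented example the classroom group's raw advantage of 12 points shrinks to 2 once its higher pretest mean is taken into account. A significance test on the adjusted difference, and a check that the group slopes are in fact parallel, would still be required.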

The use of media comparison studies in distance learning is not limited to higher education settings. Barry 
& Runyan (1995) assembled a “Review of Distance Education Studies in the U.S. Military” in which they cited 
eight “empirical studies that compared student achievement in distance learning courses to achievement in 
comparable resident courses” (p. 43). Their closing statement embraced the reliably non-significant results
as proof that the U.S. military could safely continue to invest in the expansion of distance learning initiatives. Given
the growing publication of studies like these, a distinction must be made between valid research in distance
education and evaluation efforts intended to confirm the value of distance programs.

Methodology analysis 

Although we often use the terms interchangeably, as we have noted earlier, evaluation and research are not 
the same although they may share many methods. 

Evaluation is practical and concerned with how to improve a product or whether to buy and use a product.
Studies that compare one program or media against another are primarily evaluation. Evaluation seeks to find the
programs that “work” more cheaply, efficiently, quickly, effectively, etc.

Research, on the other hand, tends to be more concerned with testing theoretical concepts and constructs or
with attempting to isolate variables to observe their contributions to a process or outcome (Moore, Myers, & Burton,
1994, p. 35).

Research studies generate hypotheses from theory. In the case of so-called “hard” sciences such as physics,
these predictions are usually a quantitative point value, magnitude, or functional form, which become point predictions
(Meehl, 1967). Point predictions become easier to refute as measurement improves; that is, the better the
measurement, the more the hypothesis is exposed to rejection. (Indeed, many replications involve changes in measures
rather than “study” conditions.) Rejection, in the case of a point prediction, is a modus tollens refutation (i.e., T -> E,
not E, therefore not T) (Popper, 1968). Research in the “soft” sciences, however, does not, indeed cannot, test point
predictions. Rather, the logical complement of the predicted outcome, the point-null hypothesis (that there is no
difference between, for example, two groups of participants' mean scores), is tested. Interestingly, the effect of this
manipulation is that the theoretically derived hypothesis is not subjected to true modus tollens; it cannot be
logically refuted (Meehl, 1967). The null hypothesis can be accepted but not embraced as true, yet its acceptance
does not refute the core hypothesis (Orey, Garrison, & Burton, 1989). For this reason (among others, such as the
“loose” connection between theory and variables), “soft” science researchers commonly speculate about what could
have occurred when the null is accepted. Such post hoc speculation is permissible because the core hypothesis cannot
be truly refuted. Such theories never actually die; researchers just sort of “...lose interest in the thing and pursue other
endeavors” (Meehl, 1978, p. 807). Theory-based research studies are therefore not good candidates to be
“repurposed” as “negative” evidence that something did not happen.

A “second level” problem relates to what Reeves (1993, 1995), among others, has referred to as
pseudoscience. Reeves offers nine characteristics of pseudoscientific studies and estimates that perhaps 60% to 70%
of “empirical-quantitative” studies in the major instructional technology research journals suffer from two or more of
these flaws. Most of these weaknesses, such as failing to link the study to a robust theory, a poor literature review,
weak treatment implementation, measurement flaws, inadequate sample size, and poor analyses (Reeves, 1993;
1995), bias the research toward not finding a statistically significant difference (Burton & Magliaro, 1988). In other
words, bad science and bad designs can produce no differences.
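The claim that weak designs bias results toward "no difference" can be illustrated with a power calculation. The sketch below uses the normal approximation for a two-sided test comparing two group means with equal group sizes; the true standardized difference (d = 0.5) and the group sizes are assumptions chosen for illustration, not values from the studies cited, and `approx_power` is a hypothetical helper.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two group means.

    Normal approximation: with a true standardized difference d and n
    participants per group, the test statistic is centred near
    d * sqrt(n / 2) instead of 0.
    """
    z = NormalDist()                        # standard normal
    crit = z.inv_cdf(1 - alpha / 2)         # about 1.96 when alpha = .05
    shift = d * (n_per_group / 2) ** 0.5    # expected statistic under H1
    # Probability the statistic falls in either rejection region.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Assumed true effect of half a standard deviation (illustrative only).
for n in (10, 20, 64):
    power = approx_power(0.5, n)
    print(f"n = {n:2d} per group: power = {power:.2f}, "
          f"chance of 'no significant difference' = {1 - power:.2f}")
```

Under these assumptions, a comparison of 20 students per group detects a real half-standard-deviation difference only about a third of the time, so a "no significant difference" outcome is the more likely result even when the treatments genuinely differ.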

The last two points relate to both theoretical and evaluative comparison studies. The first is that when we
test a hypothesis, we test not just the variable of interest but also the assumption of ceteris paribus
(all things are assumed to be equal except for those conditions that are actually manipulated) (Orey et al., 1989). It is
in fact a “folk” version of ceteris paribus that researchers often invoke when they explain a failure to find a
predicted statistically significant difference by pointing to differences in the sample or task that lay outside what
was being manipulated. To the extent that ceteris paribus does not hold, the results of the study (in either “direction”)
are suspect. With tests such as ANOVA and ANCOVA, this relates to the assumption of homogeneity of variance.
This assumption is often not tested because the tests are assumed to be robust to such violations (Thompson, 1993).
Unfortunately, this does not appear to be true of ANCOVA (see, e.g., Keppel & Zedeck, 1989) and may not be true
of ANOVA either (e.g., Wilcox, Charlin, & Thompson, 1986).
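Because the homogeneity-of-variance assumption is so often left unchecked, a minimal sketch of Levene's test statistic follows. It is computed by hand with Python's standard library on hypothetical scores; `levene_w` is an illustrative name, not a library function. The resulting W is compared against an F distribution with k - 1 and N - k degrees of freedom, a lookup this sketch leaves to a statistics package (SciPy, for example, provides `scipy.stats.levene`, whose default centers on the median).

```python
from statistics import fmean, median


def levene_w(*groups, center=fmean):
    """Levene's W statistic for equality of variances across groups.

    center=fmean gives Levene's original statistic; center=median gives
    the Brown-Forsythe variant, which is less sensitive to skewed scores.
    (Hypothetical helper for illustration, not a library function.)
    """
    # Absolute deviation of each score from its own group's center.
    devs = []
    for g in groups:
        c = center(g)
        devs.append([abs(y - c) for y in g])

    k = len(devs)
    n_total = sum(len(d) for d in devs)
    group_means = [fmean(d) for d in devs]
    grand_mean = fmean(z for d in devs for z in d)

    # One-way ANOVA F ratio computed on the absolute deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(devs, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(devs, group_means) for z in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical test scores for illustration only.
on_campus = [1, 2, 3, 4, 5]
distant = [2, 4, 6, 8, 10]
print(round(levene_w(on_campus, distant), 3))  # 2.057
```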

Second, in research or evaluation, measurement is a problem. Faulty measurement was another of Reeves’
(1993, 1995) indicators of pseudoscience, but in evaluation, particularly as it relates to “real world” educational
contexts such as those found in distance or distributed education, the problem is often more insidious. The burden of
showing reliability and validity for any test not in general use is always upon the researcher (Burton & Magliaro,
1988). Many studies related to distance learning use “teacher-made achievement tests,” which may or may not have
established reliabilities or validities. Perhaps worse than using a test that produces scores which are largely error
or unrelated to the content, however, is the fact that such tests are often used as part of a graded exercise. Graded
exercises may cause students who tend to make A’s and B’s simply to work harder to overcome any problem in the
instruction. The potential lack of adequate tests is a measurement problem. The potential difference in effort is a
violation of ceteris paribus.
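The cost of an unreliable teacher-made test can be sketched with a classical test theory result: random error in the outcome measure inflates its standard deviation without changing the group means, so the observed standardized difference shrinks to the true difference multiplied by the square root of the test's reliability. The sketch combines that attenuation with a normal-approximation power formula for a two-sided, two-group comparison; the true effect (d = 0.5), the reliabilities, the group size of 30, and the helper names are assumptions for illustration only.

```python
from statistics import NormalDist


def attenuated_d(true_d, reliability):
    """Observed standardized difference when the outcome test has the
    given reliability (classical test theory, random error assumed)."""
    return true_d * reliability ** 0.5


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-group comparison."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Assumed values: true effect d = 0.5, 30 students per group.
for reliability in (0.9, 0.7, 0.5):
    d_obs = attenuated_d(0.5, reliability)
    print(f"reliability {reliability:.1f}: observed d = {d_obs:.2f}, "
          f"power = {approx_power(d_obs, 30):.2f}")
```

The lower the reliability of the test, the smaller the observed effect and the more likely a "no significant difference" result, independent of any real difference between delivery modes.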

In terms of statistics, many current researchers have argued that null hypothesis testing should be eliminated
altogether (e.g., Carver, 1993), while others, such as Thompson (1996) and Robinson & Levin (1997), would like to
see such tests supplemented. Although there are differences, both camps tend to agree on two things: effect size and
replication. According to Thompson (1996), effect sizes should always be reported; but, as Levin (1993) points out, “to
talk of effect sizes in the face of results that are not statistically significant does not make sense” (p. 379).
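Because both camps endorse effect sizes, a short sketch of how one might report a standardized mean difference alongside an approximate confidence interval follows. It uses Cohen's d with a pooled standard deviation and the common large-sample approximation for its standard error; the scores and helper names are hypothetical, and for samples this small an exact interval based on the noncentral t distribution would be preferable.

```python
from statistics import NormalDist, fmean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (fmean(treatment) - fmean(control)) / pooled_var ** 0.5


def d_interval(d, n1, n2, level=0.95):
    """Approximate confidence interval for d (large-sample standard error)."""
    se = ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


# Hypothetical scores for illustration only.
distant = [3, 4, 5, 6, 7]
on_campus = [1, 2, 3, 4, 5]
d = cohens_d(distant, on_campus)
low, high = d_interval(d, len(distant), len(on_campus))
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")  # d = 1.26, 95% CI [-0.09, 2.62]
```

With only five scores per group the point estimate is large, yet the interval includes zero, which is consistent with a non-significant test. Reporting the interval makes that uncertainty explicit instead of leaving readers with a bare "no difference."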

Replication refers to repeating essentially the same experiment multiple times. No finding should ever rest
on a single study. It is worth noting, however, that while some believe that such experiments can inform social
science theory (e.g., Phillips, 1992), others (e.g., Salomon, 1991) believe that no matter how well constructed
experimental and similar research approaches are, "they are based on a number of assumptions, none of which fit the
study of whole classroom cultures” (p. 13). We assume this would include distant classrooms and distributed cultures.

Finally, we offer the following caveat about accepting NSD studies as proof. Retaining a null hypothesis is very
much like the presumption of "not guilty" in the US legal system. In both cases, the burden of proof falls on overturning the
assumption based on evidence. But failure to reject the null hypothesis means just that and nothing more, just as a
legal finding of not guilty does not mean innocent. As Carver (1978) puts it:

What is the probability of obtaining a dead person given that the person was hanged? Obviously, it is very 
high, perhaps .97 or higher. Now, let us reverse the question. What is the probability that a person has been hanged, 
given that the person is dead? This time the probability will undoubtedly be very low, perhaps .01 or lower. No one 
would be likely to make the mistake of substituting the first estimate (.97) for the second (.01); that is to accept (.97) 
that a person has been hanged given that the person is dead. Even though this seems an unlikely mistake, it is exactly 
the kind of mistake that is made with interpretations of statistical significance testing (pp. 384-385).
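Carver's reversal applies directly to "no significant difference" findings. The probability of a non-significant result given that the null hypothesis is true (1 - alpha) is not the probability that the null is true given a non-significant result; by Bayes' theorem the latter also depends on the test's power and on how plausible the null was before the study. The prior of 0.5, the power of 0.35, and the helper name in the sketch are assumptions for illustration only.

```python
def prob_null_given_nsd(prior_null, alpha, power):
    """P(null hypothesis true | non-significant result), by Bayes' theorem.

    prior_null: probability the null is true before the study (assumed).
    alpha: Type I error rate; power: 1 - Type II error rate.
    """
    p_nsd_if_null = 1 - alpha      # correctly retaining a true null
    p_nsd_if_effect = 1 - power    # missing a real difference (Type II error)
    p_nsd = prior_null * p_nsd_if_null + (1 - prior_null) * p_nsd_if_effect
    return prior_null * p_nsd_if_null / p_nsd


# Assumed values for illustration: even prior odds, alpha = .05, power = .35.
print(round(prob_null_given_nsd(prior_null=0.5, alpha=0.05, power=0.35), 2))  # 0.59
```

Under these assumptions a non-significant result leaves the null hypothesis only about 59% probable, far short of the .95 a reader might mistakenly infer; only with power of .95 would the two probabilities coincide.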

Evaluation versus Research in Distance Education 

While the investigators of the comparison studies cited herein may have intended to create generalizable
findings, the motivation behind the studies was most likely to obtain information about the success of
local distance education programs. Appropriate uses of media comparisons for distance program evaluation are
detailed below, along with alternative methods and exemplary models.

Evaluation in distance education 

Program evaluations in education frequently look to achievement as a measure of success, sometimes using
comparison studies as the evaluation method. Smith & Glass (1987) call such inquiries
comparative evaluations, as the studies assess the effectiveness of a product or program by pitting it against an
alternative product or program that is designed to meet the same needs. However, such comparisons work best if the
treatment group and control group are similar in composition and participants can be randomly assigned (Fitz-Gibbon & Morris,
1978), which is usually not the case in distance education. Participants in higher education distributed courses are
typically non-traditional learners who cannot attend class at the originating institution, hence their enrollment in
distance programs. Not only are these students different demographically, but they also possess other characteristics
that vary from those of traditional college attendees, such as prior knowledge and experience and level of motivation
(Verduin & Clark, 1991). Even when comparative evaluations can be designed to represent comparable groups of
learners, the results of such studies must be published as local findings rather than as generalizable contributions to the
theoretical base of distance education.

Although student achievement is one common measure of distance program success, Keegan (1996),
Holmberg (1989), and Thorpe (1988) recommend that program evaluators collect and report a number of other types
of data to give the most exhaustive description of a distance education program. Saveyne & Box (1995) suggest the
collection of information on instructional design, participant attitudes (student and instructor), and
implementation issues, such as technical quality, student support, etc. Keegan (1996) proposes a four-point
evaluation scheme for distance programs, which assesses 1) the quantity of learning achieved, such as the number of
new students served, attrition rates, time to program completion, etc.; 2) the quality of learning achieved, measured
by the effectiveness of the program in facilitating desired learning outcomes; 3) the status of the learning achieved,
indicated by the transferability of program coursework and the recognition of degrees by employers or graduate institutions;
and 4) the relative cost of the learning achieved, determined through analysis of the cost-efficiency of distance
programs relative to conventional programs, as well as the cost benefits of the distance program versus traditional
programs (1996, pp. 186-188). The case studies provided by Keegan (1996) are thoughtful examples of distance
program evaluations, as they provide a thorough portrayal of program efforts through analysis of the aforementioned
indicators. Another effective distance evaluation model can be found in the Flashlight Project (Ehrmann, 1994), an
effort by the Annenberg/Corporation for Public Broadcasting to help institutions of higher education assess their uses of
instructional technology for distributed learning. If the intentions of investigators are to determine the effectiveness
of distance education programs, these evaluation reports serve as exemplars due to their comprehensive approach.

Research in distance education 

Those involved in the design, development, and implementation of distance education programs have
access to a wealth of data from which to conduct valid research. For a summary of the existing literature base in
distance education, see McIsaac & Gunawardena (1996) and Schlosser & Anderson (1994). Interestingly, both
pieces highlight the need to move away from the continued use of media comparison studies toward more productive
lines of inquiry. If researchers are driven to investigate the effects of delivery media, perhaps they will heed Reeves’
(1995) advice and design instructional technology studies that will indeed improve education. Examples of research
that has served to inform the development of effective distance learning experiences can be seen in Garrison (1990)
and Gunawardena, Campbell Gibson, Dean, Dillon, & Hessmiller (1994). Garrison (1990) analyzed the ability of
audioconferencing to provide the levels of interaction necessary for feedback as well as for student satisfaction. Distance
delivery media also afford varied levels of social presence, as found by Gunawardena et al. (1994). Knowing how
media convey information and allow individuals to interact is an important consideration in the design of distance
programming. Indeed, more researchers should leverage their involvement in distance education experiences to
contribute to the knowledge base of the field. McIsaac and Gunawardena (1996) indicate that what is needed is "rich
qualitative information or programmatic experimental research that would lead to the testing of research hypotheses”
(p. 421). While determining the efficacy of distance programs is important to all stakeholders, investigators must
ensure that such inquiries begin with valid questions and that the intentions behind the study are well defined.
Concurrently, it is equally important that editors of professional journals also distinguish between research and
evaluation in distance education and communicate that distinction to the authors of their manuscripts.